This is a simple project aiming to show R skills. Through this project I will share basic skills of loading and formatting data. Multiple tables are joined and grouped in order to present complete summary and explicit statistics. At the end, data are presented in graphics
Used database is provided with R package for flights departed in 2013 from New York City.
## Table flights
## Rows: 336,776
## Columns: 19
## $ year <int> 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2…
## $ month <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ day <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ dep_time <int> 517, 533, 542, 544, 554, 554, 555, 557, 557, 558, 558, …
## $ sched_dep_time <int> 515, 529, 540, 545, 600, 558, 600, 600, 600, 600, 600, …
## $ dep_delay <dbl> 2, 4, 2, -1, -6, -4, -5, -3, -3, -2, -2, -2, -2, -2, -1…
## $ arr_time <int> 830, 850, 923, 1004, 812, 740, 913, 709, 838, 753, 849,…
## $ sched_arr_time <int> 819, 830, 850, 1022, 837, 728, 854, 723, 846, 745, 851,…
## $ arr_delay <dbl> 11, 20, 33, -18, -25, 12, 19, -14, -8, 8, -2, -3, 7, -1…
## $ carrier <chr> "UA", "UA", "AA", "B6", "DL", "UA", "B6", "EV", "B6", "…
## $ flight <int> 1545, 1714, 1141, 725, 461, 1696, 507, 5708, 79, 301, 4…
## $ tailnum <chr> "N14228", "N24211", "N619AA", "N804JB", "N668DN", "N394…
## $ origin <chr> "EWR", "LGA", "JFK", "JFK", "LGA", "EWR", "EWR", "LGA",…
## $ dest <chr> "IAH", "IAH", "MIA", "BQN", "ATL", "ORD", "FLL", "IAD",…
## $ air_time <dbl> 227, 227, 160, 183, 116, 150, 158, 53, 140, 138, 149, 1…
## $ distance <dbl> 1400, 1416, 1089, 1576, 762, 719, 1065, 229, 944, 733, …
## $ hour <dbl> 5, 5, 5, 5, 6, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 6, 6, 6…
## $ minute <dbl> 15, 29, 40, 45, 0, 58, 0, 0, 0, 0, 0, 0, 0, 0, 0, 59, 0…
## $ time_hour <dttm> 2013-01-01 05:00:00, 2013-01-01 05:00:00, 2013-01-01 0…
## Table airlines
## Rows: 16
## Columns: 2
## $ carrier <chr> "9E", "AA", "AS", "B6", "DL", "EV", "F9", "FL", "HA", "MQ", "O…
## $ name <chr> "Endeavor Air Inc.", "American Airlines Inc.", "Alaska Airline…
## Table airports
## Rows: 1,458
## Columns: 8
## $ faa <chr> "04G", "06A", "06C", "06N", "09J", "0A9", "0G6", "0G7", "0P2", "…
## $ name <chr> "Lansdowne Airport", "Moton Field Municipal Airport", "Schaumbur…
## $ lat <dbl> 41.13047, 32.46057, 41.98934, 41.43191, 31.07447, 36.37122, 41.4…
## $ lon <dbl> -80.61958, -85.68003, -88.10124, -74.39156, -81.42778, -82.17342…
## $ alt <dbl> 1044, 264, 801, 523, 11, 1593, 730, 492, 1000, 108, 409, 875, 10…
## $ tz <dbl> -5, -6, -6, -5, -5, -5, -5, -5, -5, -8, -5, -6, -5, -5, -5, -5, …
## $ dst <chr> "A", "A", "A", "A", "A", "A", "A", "A", "U", "A", "A", "U", "A",…
## $ tzone <chr> "America/New_York", "America/Chicago", "America/Chicago", "Ameri…
## Table planes
## Rows: 3,322
## Columns: 9
## $ tailnum <chr> "N10156", "N102UW", "N103US", "N104UW", "N10575", "N105UW…
## $ year <int> 2004, 1998, 1999, 1999, 2002, 1999, 1999, 1999, 1999, 199…
## $ type <chr> "Fixed wing multi engine", "Fixed wing multi engine", "Fi…
## $ manufacturer <chr> "EMBRAER", "AIRBUS INDUSTRIE", "AIRBUS INDUSTRIE", "AIRBU…
## $ model <chr> "EMB-145XR", "A320-214", "A320-214", "A320-214", "EMB-145…
## $ engines <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
## $ seats <int> 55, 182, 182, 182, 55, 182, 182, 182, 182, 182, 55, 55, 5…
## $ speed <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ engine <chr> "Turbo-fan", "Turbo-fan", "Turbo-fan", "Turbo-fan", "Turb…
## Table weather
## Rows: 26,115
## Columns: 15
## $ origin <chr> "EWR", "EWR", "EWR", "EWR", "EWR", "EWR", "EWR", "EWR", "EW…
## $ year <int> 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013,…
## $ month <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ day <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ hour <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, …
## $ temp <dbl> 39.02, 39.02, 39.02, 39.92, 39.02, 37.94, 39.02, 39.92, 39.…
## $ dewp <dbl> 26.06, 26.96, 28.04, 28.04, 28.04, 28.04, 28.04, 28.04, 28.…
## $ humid <dbl> 59.37, 61.63, 64.43, 62.21, 64.43, 67.21, 64.43, 62.21, 62.…
## $ wind_dir <dbl> 270, 250, 240, 250, 260, 240, 240, 250, 260, 260, 260, 330,…
## $ wind_speed <dbl> 10.35702, 8.05546, 11.50780, 12.65858, 12.65858, 11.50780, …
## $ wind_gust <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 20.…
## $ precip <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ pressure <dbl> 1012.0, 1012.3, 1012.5, 1012.2, 1011.9, 1012.4, 1012.2, 101…
## $ visib <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,…
## $ time_hour <dttm> 2013-01-01 01:00:00, 2013-01-01 02:00:00, 2013-01-01 03:00…
There were 336776 flights. The longest one was of 4983 miles but the shortest one was only 17 miles. The longest time in air was of 695 minutes but the shortest one was only 20 minutes long. Regarding maximum delay it was of 1301 minutes. On the other hand, the earliest departure was 43 minutes before scheduled time.
Flights were served by 16 carriers.
There were 1458 destinations.
Through the New York City airport there passed 3322 different planes. The oldest one was manufactured in 1956 and the newest one was from 2013. The largest plane was designed to carry 450 passengers but the smallest one was only for 2. The average cruising speed was of 236.8 mph.
The strongest wind during take off or landing was 1048 mph but the average one was only 10 mph. The strongest precipitations were of 1,21 inch. Mean visibility was of 9,2 miles but the worst one was of 0,0 miles.
This section present data changes in order to facilitate further analysis.
## [1] 73941
Total quantity of NA values is 73941.
This section present initial grouped statistics. Date in flights table was saved in a new column with date type. According to date, corresponding quarter was assigned to each flight. General statistics were calculated for each quarter.
| quart | mean_dep_delay | mean_arr_delay | mean_air_time | mean_distance | number_flights |
|---|---|---|---|---|---|
| 1 | 11.415210 | 5.857851 | 151.4962 | 1006.869 | 80789 |
| 2 | 15.878225 | 10.310172 | 149.6872 | 1045.553 | 85369 |
| 3 | 13.794974 | 6.400052 | 146.1716 | 1054.259 | 86326 |
| 4 | 9.363125 | 4.956591 | 155.5180 | 1051.178 | 84292 |
The lowest number of flights was made in the first quarter. In the same time, the mean flight distance was the shortest. On the other hand, the highest value of delays was noted in the second quarter.
Short statistics of flights were collected for each carrier. They were arranged according to the mean arrival delay, from the highest to the lowest. As we can see, the highest mean delay were above 20 minutes for two airlines: Frontier Airlines Inc. and AirTran Airways Corporation.
| name | mean_dep_delay | mean_arr_delay | mean_air_time | mean_distance | number_flights |
|---|---|---|---|---|---|
| Frontier Airlines Inc. | 20.215543 | 21.9207048 | 229.59912 | 1620.0000 | 685 |
| AirTran Airways Corporation | 18.726075 | 20.1159055 | 101.14394 | 664.8294 | 3260 |
| ExpressJet Airlines Inc. | 19.955390 | 15.7964311 | 90.07619 | 562.9917 | 54173 |
| Mesa Airlines Inc. | 18.996330 | 15.5569853 | 65.74081 | 375.0333 | 601 |
| SkyWest Airlines Inc. | 12.586207 | 11.9310345 | 83.48276 | 500.8125 | 32 |
| Envoy Air | 10.552041 | 10.7747334 | 91.18025 | 569.5327 | 26397 |
| Southwest Airlines Co. | 17.711744 | 9.6491199 | 147.82481 | 996.2691 | 12275 |
| JetBlue Airways | 13.022522 | 9.4579733 | 151.17717 | 1068.6215 | 54635 |
| Endeavor Air Inc. | 16.725769 | 7.3796692 | 86.78160 | 530.2358 | 18460 |
| United Air Lines Inc. | 12.106073 | 3.5580111 | 211.79135 | 1529.1149 | 58665 |
| US Airways Inc. | 3.782418 | 2.1295951 | 88.57380 | 553.4563 | 20536 |
| Virgin America | 12.869421 | 1.7644644 | 337.00235 | 2499.4822 | 5162 |
| Delta Air Lines Inc. | 9.264504 | 1.6443409 | 173.68880 | 1236.9012 | 48110 |
| American Airlines Inc. | 8.586016 | 0.3642909 | 188.82230 | 1340.2360 | 32729 |
| Hawaiian Airlines Inc. | 4.900585 | -6.9152047 | 623.08772 | 4983.0000 | 342 |
| Alaska Airlines Inc. | 5.804775 | -9.9308886 | 325.61777 | 2402.0000 | 714 |
Having that in mind, we select the flights of those two airlines and we compare wind conditions for the worst 15 flights.
| arr_delay | carrier | origin | dest | time_hour | wind_speed | precip | visib |
|---|---|---|---|---|---|---|---|
| 834 | F9 | LGA | DEN | 2013-02-10 08:00:00 | 8.05546 | 0.00 | 10.0 |
| 572 | FL | LGA | ATL | 2013-09-12 14:00:00 | 13.80936 | 0.00 | 9.0 |
| 551 | FL | LGA | ATL | 2013-07-10 14:00:00 | 12.65858 | 0.00 | 8.0 |
| 505 | FL | LGA | ATL | 2013-09-02 14:00:00 | 4.60312 | 0.01 | 6.0 |
| 461 | FL | LGA | ATL | 2013-05-23 17:00:00 | 10.35702 | 0.00 | 7.0 |
| 436 | FL | LGA | ATL | 2013-03-08 08:00:00 | 19.56326 | 0.04 | 0.5 |
| 433 | FL | LGA | CAK | 2013-12-18 09:00:00 | 9.20624 | 0.00 | 10.0 |
| 410 | F9 | LGA | DEN | 2013-03-08 17:00:00 | 14.96014 | 0.00 | 10.0 |
| 400 | FL | LGA | CAK | 2013-08-12 14:00:00 | 6.90468 | 0.00 | 10.0 |
| 395 | FL | LGA | CAK | 2013-07-10 14:00:00 | 12.65858 | 0.00 | 8.0 |
| 390 | FL | LGA | ATL | 2013-07-10 17:00:00 | 14.96014 | 0.00 | 10.0 |
| 382 | FL | LGA | ATL | 2013-07-07 18:00:00 | 12.65858 | 0.00 | 10.0 |
| 374 | FL | LGA | ATL | 2013-07-23 14:00:00 | 10.35702 | 0.00 | 10.0 |
| 345 | FL | LGA | CAK | 2013-11-10 09:00:00 | 12.65858 | 0.00 | 10.0 |
| 345 | F9 | LGA | DEN | 2013-05-23 13:00:00 | 16.11092 | 0.00 | 10.0 |
This summary shows that for the highest arrival delay in majority of cases the weather was not a problem. The only exception is the flight on 8th march 2023 at 8 AM, when visibility was below 1 mile and wind speed was around 20 mph.
This section presents main graphics.